TST[string]: update expecteds for using_string_dtype to fix xfails #61727

jbrockmendel · 2025-06-27T22:59:38Z

It isn't 100% obvious that the new repr for Categoricals is an improvement, but it's non-crazy. One of the remaining xfails one is for eval(repr(categorical_index)) round-tripping that won't be fixable unless we revert back to the old repr behavior.

I'm pretty sure that the fix in test_astype_dt64_to_string is correct and the test is just wrong, but merits a close look.

That leaves 12 xfails, including the one un-fixable round-trip one that we'll just remove. Of those...

test_join_on_key i think is surfacing an unrelated bug that I'll take a look at
test_to_dict_of_blocks_item_cache is failing because we don't make series.values read-only for ArrowStringArray. I think @mroeschke can comment on how viable/important that is.
test_string_categorical_index_repr is about CategoricalIndex repr that span multiple lines; with the StringDtype the padding is changed.
4 in pandas/tests/io/json/test_pandas.py that im hoping @WillAyd can take point on
test_to_string_index_with_nan theres a MultiIndex level that reprs with a nan instead of NaN. Not a huge deal but having mixed-and-matched nans/NaNs in the repr is weird.
test_from_records_sequencelike i don't have a good read on
tests.base.test_misc::test_memory_usage is skipped instead of xfailed, but the reason says that it "doesn't work properly" for arrow strings which seems xfail-adjacent. Instead of skipping can we update the expected behavior cc @jorisvandenbossche ?

(Update: looks like I missed one in test_http_headers and another in test_fsspec)

WillAyd · 2025-06-30T13:05:15Z

The JSON issues stem back to the fact that:

>>> pd.Series([None, '', 'c']).astype(object)

yields different behavior with/without the future string dtype. In the "old" world, this would preserve the value of None but in the new world, None gets cast to a missing value indicator when contained within a series of string values.

In theory we could try and work around those semantics by natively supporting an object type in the JSON reader, but that's a ton of effort and I don't think worth it, given JSON does not natively support object storage

jbrockmendel · 2025-06-30T15:25:23Z

thanks, will update those tests' expecteds

mroeschke · 2025-06-30T17:35:19Z

we don't make series.values read-only for ArrowStringArray

Can't speak for the viability but I think this should be read-only per CoW. Is this related to the underlying ArrowExtensionArray.__setitem__ immutably issue?

jbrockmendel · 2025-06-30T17:45:00Z

Can't speak for the viability but I think this should be read-only per CoW. Is this related to the underlying ArrowExtensionArray.setitem immutably issue?

I suspect we would need a mechanism like ndarray.flags for EAs to declare an object as read-only. Definitely out of scope for this PR.

Looking at the test, the pertinent behavior can be tested by making the column specifically object dtype.

jorisvandenbossche

Thanks a lot Brock, the fixes in the current diff all look good!

jorisvandenbossche · 2025-06-30T19:32:26Z

tests.base.test_misc::test_memory_usage is skipped instead of xfailed, but the reason says that it "doesn't work properly" for arrow strings which seems xfail-adjacent. Instead of skipping can we update the expected behavior cc @jorisvandenbossche ?

I quickly checked that one, and I am not entirely sure why we skipped it based on "doesn't work properly for arrow strings". The issue with the test seems to be that it is assuming too much about the input data, and its is_object definition is no longer correct. For example, a MultiIndex gets considered as object dtype, but now if all levels of the MultiIndex are using str dtype, then the test should not actually see it as is_object.
Same for is_categorical where the test code assumes that a categorical is using object dtype categories, I think.

(can do a separate PR to fix this one, now that I already dived into it)

pandas/tests/io/json/test_pandas.py

mroeschke

We'll want to backport this into 2.3.x?

jbrockmendel · 2025-07-03T18:15:44Z

Looks like #61757 wasn't backported, so i dont think this one should be either

TST: update expecteds for using_string_dtype to fix xfails

91dfbee

Merge branch 'main' into tst-string-xfails

8b64447

jorisvandenbossche approved these changes Jun 30, 2025

View reviewed changes

jbrockmendel added 5 commits June 30, 2025 14:17

Merge branch 'main' into tst-string-xfails

b6d762b

Update to_dict_of_blocks test to hardcode object dtype

3ae2256

Comment

6c8fc89

Split test, update expected, targeted xfails

94deaba

Update json test

6d11c3a

jbrockmendel commented Jun 30, 2025

View reviewed changes

pandas/tests/io/json/test_pandas.py Outdated Show resolved Hide resolved

jbrockmendel added 3 commits June 30, 2025 14:53

revert commented-out

3102045

Merge branch 'main' into tst-string-xfails

b98d48a

Merge branch 'main' into tst-string-xfails

01521dd

jorisvandenbossche mentioned this pull request Jul 1, 2025

TST (string dtype): resolve skip in misc test_memory_usage #61757

Merged

mroeschke approved these changes Jul 2, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

TST[string]: update expecteds for using_string_dtype to fix xfails #61727

TST[string]: update expecteds for using_string_dtype to fix xfails #61727

jbrockmendel commented Jun 27, 2025 •

edited

Loading

Uh oh!

WillAyd commented Jun 30, 2025

Uh oh!

jbrockmendel commented Jun 30, 2025 •

edited

Loading

Uh oh!

mroeschke commented Jun 30, 2025

Uh oh!

jbrockmendel commented Jun 30, 2025

Uh oh!

jorisvandenbossche left a comment

Uh oh!

jorisvandenbossche commented Jun 30, 2025

Uh oh!

Uh oh!

mroeschke left a comment

Uh oh!

jbrockmendel commented Jul 3, 2025

Uh oh!

Uh oh!

Uh oh!

TST[string]: update expecteds for using_string_dtype to fix xfails #61727

Are you sure you want to change the base?

TST[string]: update expecteds for using_string_dtype to fix xfails #61727

Conversation

jbrockmendel commented Jun 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

WillAyd commented Jun 30, 2025

Uh oh!

jbrockmendel commented Jun 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mroeschke commented Jun 30, 2025

Uh oh!

jbrockmendel commented Jun 30, 2025

Uh oh!

jorisvandenbossche left a comment

Choose a reason for hiding this comment

Uh oh!

jorisvandenbossche commented Jun 30, 2025

Uh oh!

Uh oh!

mroeschke left a comment

Choose a reason for hiding this comment

Uh oh!

jbrockmendel commented Jul 3, 2025

Uh oh!

Uh oh!

jbrockmendel commented Jun 27, 2025 •

edited

Loading

jbrockmendel commented Jun 30, 2025 •

edited

Loading